Conversation
There was an implicit conversion to float64 taking place due to a cast to a Python list (and hence to Python floats). This resulted in a serious (factor ~10) performance degradation, which should now be fixed.
The degradation was caused by temporary allocations performed by pandas whenever a frame's dtype was implicitly changed by updates of the form:
```python
import pandas as pd
import numpy as np
df = pd.DataFrame({"f32": np.ones(2, dtype="float32")})
df.iloc[1:2] = np.float64(1/3)
```
Since pandas 2.1, such operations raise a FutureWarning. All occurrences of that warning in DEMENTpy are resolved in this commit.
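One way to silence the warning while keeping the frame in float32 is to cast the scalar explicitly before the assignment. This is a minimal sketch of the pattern, not necessarily the exact change made in the commit:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"f32": np.ones(2, dtype="float32")})
# Cast the scalar to float32 up front, so the assignment cannot upcast
# the column (and therefore cannot trigger the FutureWarning).
df.iloc[1:2] = np.float32(1 / 3)
print(df["f32"].dtype)  # float32
```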
```diff
 choose_taxa = np.zeros((self.n_taxa, self.gridsize), dtype='int8')
 for i in range(self.n_taxa):
-    choose_taxa[i,:] = np.random.choice([1,0], self.gridsize, replace=True, p=[frequencies[i], 1-frequencies[i]])
+    choose_taxa[i,:] = np.random.binomial(1, frequencies[i], self.gridsize)
```
Contributor (Author):
When I changed the numbers back to "float32", this sampling started failing, saying that the probabilities do not sum to 1.0. I presume they must be converted to "float64" before being summed inside np.random.choice.
I have changed the sampling to binomial, which I believe is equivalent and avoids the summation errors.
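As a quick sanity check of the claimed equivalence (my own sketch, not part of the PR; `p` and `n` are hypothetical stand-ins for `frequencies[i]` and `self.gridsize`): a single binomial trial is exactly a Bernoulli draw, so both samplers have the same distribution.

```python
import numpy as np

p, n = 0.3, 100_000  # hypothetical probability and grid size
rng = np.random.default_rng(0)

bern = rng.binomial(1, p, n)                # Bernoulli via binomial(1, p)
choi = rng.choice([1, 0], n, p=[p, 1 - p])  # same distribution via choice
print(bern.mean(), choi.mean())             # both close to p
```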
```diff
 [Enzyme_Loss,
- Enzyme_Loss.mul(self.Enz_Attrib['N_cost'].tolist()*self.gridsize, axis=0),
- Enzyme_Loss.mul(self.Enz_Attrib['P_cost'].tolist()*self.gridsize, axis=0)],
+ Enzyme_Loss.mul(np.repeat(self.Enz_Attrib['N_cost'].values, self.gridsize), axis=0),
```
Contributor (Author):
The 'float64' was appearing here: the tolist method returns a list of Python floats, which are double precision.
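A minimal illustration of the point above (a standalone sketch, not code from the PR): `tolist` goes through Python floats, which are always 64-bit, whereas `np.repeat` on the underlying `.values` array preserves the original dtype.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.ones(3, dtype="float32"))
via_list = s.tolist() * 2            # Python floats, i.e. 64-bit doubles
via_repeat = np.repeat(s.values, 2)  # stays float32
print(type(via_list[0]))             # <class 'float'>
print(via_repeat.dtype)              # float32
```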
jgwalkup approved these changes on Nov 12, 2025.
Builds on #34, related to #22
The problem was ultimately related to how pandas handles partial assignments in DataFrames and Series. If the data is assigned to from a higher-precision expression, the dtype of the Series may be implicitly changed, which causes a temporary memory allocation.
Please note that the behaviour is a bit peculiar and perhaps unintuitive: pandas will not change the dtype if it is "not necessary", that is, if the new value can be represented exactly in the prior dtype.
The implicit conversion to float64 was causing some DataFrames to change their dtype to "float64", which in turn caused Max_Uptake (in the grid.uptake method) to get repeatedly changed from "float32" to "float64"; this appears to have been the main performance bottleneck.
Tagging @bioatmosphere explicitly since you were interested during the meeting yesterday ;-)
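The "not necessary" behaviour can be illustrated with a short sketch (my own reconstruction, since the original inline example did not survive): a float64 value that is exactly representable in float32 leaves the dtype alone, while an inexact one triggers the implicit dtype change.

```python
import warnings
import numpy as np
import pandas as pd

df = pd.DataFrame({"f32": np.ones(2, dtype="float32")})

# 1.0 is exactly representable in float32, so the dtype is left alone.
df.iloc[1:2] = np.float64(1.0)
print(df["f32"].dtype)  # float32

# 1/3 is NOT exactly representable in float32; depending on the pandas
# version this either upcasts with a FutureWarning (2.x) or is rejected.
try:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", FutureWarning)
        df.iloc[1:2] = np.float64(1 / 3)
    print(df["f32"].dtype)
except TypeError:
    print("lossy implicit upcast rejected")
```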